A Source Localization/Separation/Respatialization System Based on Unsupervised Classification of Interaural Cues
نویسندگان
چکیده
In this paper we propose a complete computational system for Auditory Scene Analysis. This time-frequency system localizes, separates, and spatializes an arbitrary number of audio sources given only binaural signals. The localization is based on recent research frameworks, where interaural level and time differences are combined to derive a confident direction of arrival (azimuth) at each frequency bin. Here, the power-weighted histogram constructed in the azimuth space is modeled as a Gaussian Mixture Model, whose parameter structure is revealed through a weighted Expectation Maximization. Afterwards, a bank of Gaussian spatial filters is configured automatically to extract the sources with significant energy accordingly to a posterior probability. In this frequency-domain framework, we also inverse a geometrical and physical head model to derive an algorithm that simulates a source as originating from any azimuth angle.
منابع مشابه
Source separation based on binaural cues and source model constraints
We describe a system for separating multiple sources from a two-channel recording based on interaural cues and known characteristics of the source signals. We combine a probabilistic model of the observed interaural level and phase differences with a prior model of the source statistics and derive an EM algorithm for finding the maximum likelihood parameters of the joint model. The system is ab...
متن کاملCombining localization cues and source model constraints for binaural source separation
We describe a system for separating multiple sources from a two-channel recording based on interaural cues and prior knowledge of the statistics of the underlying source signals. The proposed algorithm effectively combines information derived from low level perceptual cues, similar to those used by the human auditory system, with higher level information related to speaker identity. We combine ...
متن کاملA Latently Constrained Mixture Model for Audio Source Separation and Localization
We present a method for audio source separation and localization from binaural recordings. The method combines a new generative probabilistic model with time-frequency masking. We suggest that device-dependent relationships between point-source positions and interaural spectral cues may be learnt in order to constrain a mixture model. This allows to capture subtle separation and localization fe...
متن کاملLocalization based stereo speech source separation using probabilistic time-frequency masking and deep neural networks
Time-frequency (T-F) masking is an effective method for stereo speech source separation. However, reliable estimation of the T-F mask from sound mixtures is a challenging task, especially when room reverberations are present in the mixtures. In this paper, we propose a new stereo speech separation system where deep neural networks are used to generate soft T-F mask for separation. More specific...
متن کاملPerformance of Source Spatialization and Source Localization Algorithms Using Conjoint Models of Interaural Level and Time Cues
In this paper, we describe a head-model based on interaural cues (e.g. interaural level differences and interaural time differences). Based on this model, we proposed, in previous works, a binaural source spatialization method (SSPA), that we extended to a multispeaker spatialization technique that works on a speaker array in a pairwise motion (MSPA) [1], [2]. Here, we evaluate the spatializati...
متن کامل